1. Introduction

Our aim in this article is to establish consistency criteria for the method of
penalized maximum likelihood estimation for the number of states in a hidden
Markov chain in an AR-MR process. We show strong consistency for an estimator
of the number of states in an autoregressive process with Markov regime when
the regression functions are linear and the noise is Gaussian.

Autoregressive processes with Markov regime can be looked at as a combina- 
tion of hidden Markov models (HMM) with threshold regression models. These 
have been introduced in an econometric context by Goldfeld and Quandt [12] 
and they have become quite popular in the literature ever since Hamilton [13] 
employed them in the analysis of the gross domestic product of the USA for two
regimes: one of contraction and another of expansion. 
When the number of states in a hidden Markov chain is known a priori, the
estimation problems can be solved, in principle, through the use of techniques
based on maximum likelihood estimation (see MacDonald and Zucchini [17] and
Cappé et al. [2]). But in many applications, a key problem is to determine the
number of states in such a way that the data are adequately described while a
compromise is maintained between goodness of fit and the ability of the model
to generalize. The problem of estimating the number of states of the hidden
Markov chain in AR-MR is a typical example of a nested family of models: models
with m parameters can also be seen as models with m + 1 parameters. Thus the
problem of model selection is essentially that of determining the smallest
model that contains the distribution capable of generating the data. In many
instances, the estimation of the model will depend on how identifiability
affects the model and not on the specification of the correct model.

A first approach to determining the dimension of the model is a statistical
test based on the likelihood ratio (see Dacunha-Castelle and Duflo [5], p. 227).
For the estimation of the number of states in a hidden Markov chain, the
likelihood ratio test fails because the regularity assumptions do not hold. In
particular, the model is not identifiable, as some parameters do not show up
under the null hypothesis and the information matrix is singular. As a result,
the asymptotic distribution of the likelihood ratio is not $\chi^2$. As an
alternative, one can construct generalized likelihood ratio tests that hold
under non-standard conditions. For the problem of determining the number of
states in AR-MR, Hansen [14] has proposed a test that works under loss of
identifiability, but in order to implement it one needs to calculate p-values
in an approximate way; this leads to computationally heavy calculations which
produce approximate p-values that underestimate the real ones. Garcia [8] has
advanced more attractive computational alternatives which lack, however, the
technical rigor present in Hansen's approach.

For HMM models the likelihood ratio statistic is not bounded. Gassiat and
Keribin [11] have studied it and have shown that it diverges to infinity. The
rate of growth of the likelihood ratio as the number of parameters increases is
related to the complexity of the model. This leads us to consider penalized
estimators based on the likelihood function that compensate for the differences
between models with different dimensions. The specification of small penalties
depends on how the rate of divergence to infinity of the likelihood ratio is
determined. But as far as we know this is still an open problem for HMM models
where the data belong to infinite sets.

In general, criteria for penalized likelihood are obtained through
approximations to the Kullback-Leibler divergence. Among others, we find the
very popular information criterion of Akaike (AIC) and the Bayesian one (BIC).
These have been used by several authors in applications of HMM models; however,
as mentioned by MacDonald and Zucchini [17], these authors have made no
reference to their validity.

We shall distinguish two cases, according to whether or not the observed
variables take values in an infinite set. For the case of the HMM model with
data belonging to a finite set, much work has been done, starting with
Finesso's presentation of the problem [7], where he establishes the strong
consistency of the penalized estimator of the number of states assuming that
the actual number of states belongs to a bounded set of integers. Liu and
Narayan [16], under this restriction, introduce a strongly consistent estimator
based on statistical mixtures of the Krichevsky-Trofimov type, as this allows
one to normalize the likelihood so as to control its growth when the number of
states is increased. In studying the efficiency of their estimator, they show
specifically that the probability of underestimating decreases at an
exponential rate with the sample size, whereas the probability of
overestimating does not exceed a third-degree polynomial of the sample size.
Based on this former work, Gassiat and Boucheron [10] have introduced
considerable advances: they proved the strong consistency of the penalized
estimator without assuming a priori upper bounds for the number of states; in
addition, they showed that the probabilities of underestimating as well as of
overestimating fall at an exponential rate with sample size. For AR-MR
processes with observations belonging to a finite set, the techniques
introduced by Gassiat and Boucheron were further used by Chambaz and Matias [4]
to simultaneously show the consistency of the estimators of the number of
states of the hidden chain and of the memory of the observed process.

For the non-finite case in HMM models, Rydén [19] has shown consistency
for a penalized likelihood estimator which in the limit does not underestimate
the number of states. Dortet-Bernadet [6] has shown that under certain
regularity conditions the Rydén estimator is indeed consistent. Gassiat [9],
studying a penalized marginal likelihood estimator, concludes that it is
consistent in probability for the actual number of states. This technique is
extended by Olteanu and Rynkiewicz [18] in order to select the number of
regression functions in processes where the regime is controlled by an
independent sequence. In this very same work, the authors indicate that the
penalized marginal likelihood criterion cannot be directly applied to AR-MR.
Smith et al. [21] have advanced a new information criterion in order to
approximate the Kullback-Leibler divergence and to select the number of states
and the variables in AR-MR. This criterion imposes a penalty that reduces
overestimation of the number of states. Following the work on finite alphabets
in Refs. [7, 16, 10], Chambaz et al. [3] have shown strong consistency for
penalized and Bayesian estimators of the number of states in HMM with
observations belonging to infinite (discrete and continuous) sets; they have
worked with conditionally Poisson and Gaussian distributions. As in the
previous works, no a priori bounds are assumed for the number of states.

Following Chambaz et al. [3], we prove a mixture-type inequality (see Section
2.1) that allows us to normalize the likelihood, and we also prove in Section
3, without assuming a priori bounds on the actual number of states of the
hidden Markov chain, that the penalized estimator does not overestimate it. In
order to show that the penalized estimator does not underestimate the number
of states, we use an approach that works well for nested models and which is
based on the equicontinuity of the likelihood function. We would like to point
out that our results are obtained for the linear case and that they can be
easily generalized to the nonlinear case if we assume that a sublinearity
hypothesis such as the one required by Yao and Attali [22] holds, while
retaining the Gaussian assumption.



2. Definitions and introductory comments 

A linear autoregressive process with Markov regime (AR-MR) is defined by: 

$$Y_n = a_{X_n} Y_{n-1} + b_{X_n} + \sigma_{X_n} e_n \qquad (2.1)$$

where $\{e_n\}$ are i.i.d. random variables, $\sigma_i^2$ is the variance of the
model in each regime and $\sigma^2 = (\sigma_1^2, \ldots, \sigma_m^2)$. The
sequence $\{X_n\}$ is a homogeneous Markov chain with state space
$\{1, \ldots, m\}$. We denote by $A$ its transition matrix, $A = [a_{ij}]$. For
each $1 \le i \le m$ we have $\theta_i = (b_i, a_i)^*$ and

$$\theta = \begin{pmatrix} b_1 & b_2 & \cdots & b_m \\ a_1 & a_2 & \cdots & a_m \end{pmatrix}.$$

We assume that:

S1 The Markov chain $\{X_n\}$ is positive recurrent. Hence, it has an invariant
distribution that we denote by $\lambda = (\lambda_1, \ldots, \lambda_m)$.

S2 $Y_0$, the Markov chain $\{X_n\}$ and the sequence $\{e_n\}$ are mutually
independent.

S3 The $e_n$ have Gaussian distribution $\mathcal{N}(0, 1)$.

S4 $E_\lambda(\log a) = \sum_{i=1}^{m} \lambda_i \log(a_i) < 0$ (stability condition).

S5 The parameter $\theta_i$ belongs to the compact subset $\Theta_i \subset \mathbb{R}^2$.

S6 For each $1 \le i \le m$, $\sigma_i^2 \in [c, d]$, $c > 0$.
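The model (2.1) under assumptions S1-S4 can be illustrated with a short simulation. The following is only a minimal sketch under stated assumptions: the function names (`simulate_ar_mr`, `stationary_distribution`, `stability_gap`) are hypothetical, states are labelled 0, ..., m-1 rather than 1, ..., m, the initial regime is drawn uniformly (the text does not fix its law), and the stability check uses $\log|a_i|$ so that negative coefficients are handled.

```python
import numpy as np

def simulate_ar_mr(n, a, b, sigma, A, y0=0.0, seed=0):
    """Simulate Y_k = a[X_k]*Y_{k-1} + b[X_k] + sigma[X_k]*e_k for k = 1..n.

    Returns (y, x) with y = (y_0, ..., y_n) and x = (x_0, ..., x_n).
    """
    rng = np.random.default_rng(seed)
    a, b, sigma, A = (np.asarray(v, dtype=float) for v in (a, b, sigma, A))
    m = len(a)
    x = np.empty(n + 1, dtype=int)
    y = np.empty(n + 1)
    x[0] = rng.integers(m)          # assumption: uniform initial regime
    y[0] = y0
    for k in range(1, n + 1):
        x[k] = rng.choice(m, p=A[x[k - 1]])            # Markov transition
        noise = rng.standard_normal()                  # S3: N(0, 1) noise
        y[k] = a[x[k]] * y[k - 1] + b[x[k]] + sigma[x[k]] * noise
    return y, x

def stationary_distribution(A):
    """Invariant law lambda of A (left eigenvector for eigenvalue 1)."""
    w, v = np.linalg.eig(np.asarray(A, dtype=float).T)
    lam = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return lam / lam.sum()

def stability_gap(a, A):
    """sum_i lambda_i * log|a_i|; a negative value matches the spirit of S4."""
    lam = stationary_distribution(A)
    return float(lam @ np.log(np.abs(np.asarray(a, dtype=float))))
```

Note that a single regime may be explosive ($|a_i| > 1$) while the weighted sum in S4 remains negative, which is what the stability condition allows.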

The parameter space is the set

$$\Psi_m = \Big\{ \psi = (\theta, \sigma^2, A) : \theta \in \bigotimes_{i=1}^{m} \Theta_i,\ \sigma^2 \in [c, d]^m,\ \sum_{j=1}^{m} a_{ij} = 1 \Big\}.$$

Notations

• $V_1^n$ stands for the random vector $(V_1, \ldots, V_n)^*$ and
$v_1^n = (v_1, \ldots, v_n)^*$ for any realization.

• The symbol $1_B(x)$ denotes the indicator function, which takes the value 1
if $x \in B$ and 0 elsewhere.

• Distributions and densities are denoted by $p$.

For each $1 \le i \le m$,

• Let $n_i = \sum_{k=1}^{n} 1_i(X_k)$ be the number of visits of a realization
of the Markov chain $\{X_n\}$ to state $i$ in the first $n$ steps;
$n_{ij} = \sum_{k=1}^{n} 1_{i,j}(X_{k-1}, X_k)$ is the number of transitions
from $i$ to $j$ in $n$ steps.

• Let $I_i := \{k \le n : X_k = i\} = \{k_1, \ldots, k_{n_i}\}$.

• Let

$$Y_{I_i} := (Y_{k_1}, \ldots, Y_{k_{n_i}})^*, \qquad Y_{I_i - 1} := (Y_{k_1 - 1}, \ldots, Y_{k_{n_i} - 1})^*, \qquad E_i := (e_{k_1}, \ldots, e_{k_{n_i}})^*.$$

• The symbol $1_{n_i}$ denotes an $n_i$-dimensional column vector with 1 in all
of its positions, and $W_i = [1_{n_i}, Y_{I_i - 1}]$.

The process $\{Y_n\}$ is in general not a Markov chain, but the associated
process $\{(Y_n, X_n)\}$ is a Markov chain with state space
$\mathbb{R} \times \{1, \ldots, m\}$. In what follows we introduce some
properties of the likelihood function for the present model, to be used
throughout this work.
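The counting notation above ($n_i$, $n_{ij}$, $I_i$, $Y_{I_i}$, $W_i$) can be made concrete for a given path. The sketch below is illustrative only: `regime_statistics` is a hypothetical helper, states are labelled 0, ..., m-1, and the inputs are assumed to be the full vectors $(y_0, \ldots, y_n)$ and $(x_0, \ldots, x_n)$.

```python
import numpy as np

def regime_statistics(y, x, m):
    """Visit counts n_i, transition counts n_ij and the pairs (Y_{I_i}, W_i).

    y = (y_0, ..., y_n), x = (x_0, ..., x_n) with x_k in {0, ..., m-1}.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=int)
    n = len(y) - 1
    k = np.arange(1, n + 1)

    # n_i: number of k in 1..n with x_k = i
    visits = np.bincount(x[1:], minlength=m)

    # n_ij: number of k in 1..n with (x_{k-1}, x_k) = (i, j)
    trans = np.zeros((m, m), dtype=int)
    np.add.at(trans, (x[:-1], x[1:]), 1)

    blocks = []
    for i in range(m):
        idx = k[x[1:] == i]                      # the index set I_i
        Y = y[idx]                               # Y_{I_i}
        W = np.column_stack([np.ones(len(idx)),  # column of ones
                             y[idx - 1]])        # lagged values Y_{I_i - 1}
        blocks.append((Y, W))
    return visits, trans, blocks
```

With these blocks, each regime contributes one ordinary linear regression of $Y_{I_i}$ on $W_i$.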

3. The likelihood function 

We consider the conditional distribution $p_\psi(Y_1^n \mid Y_0 = y_0)$ as the
likelihood function for a set of observations $Y_1^n = y_1^n$ and parameter
$\psi$. By the law of total probability, the likelihood function for the model
is given by:

$$p_\psi(Y_1^n \mid Y_0 = y_0) = \sum_{x_1^n} p_\psi(Y_1^n, X_1^n = x_1^n \mid Y_0 = y_0) = \sum_{x_1^n} p_{\theta,\sigma^2}(Y_1^n \mid Y_0 = y_0, x_1^n)\, p_A(x_1^n). \qquad (3.1)$$
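The sum over all paths in (3.1) has $m^n$ terms, but it can be evaluated in $O(n m^2)$ operations with the standard forward recursion for hidden Markov models. The sketch below is an assumption-laden illustration rather than the authors' procedure: `log_likelihood` is a hypothetical name, the law of $X_0$ (argument `init`) is taken uniform by default because the text does not specify it, and the recursion is rescaled at each step to limit underflow.

```python
import numpy as np

def log_likelihood(y, a, b, sigma2, A, init=None):
    """log p_psi(Y_1^n | Y_0 = y_0) for the Gaussian linear AR-MR model.

    y = (y_0, ..., y_n); a, b, sigma2 have length m; A is m x m, rows sum to 1.
    init is the law of X_0 (assumption: uniform when not given).
    """
    y = np.asarray(y, dtype=float)
    a, b, sigma2, A = (np.asarray(v, dtype=float) for v in (a, b, sigma2, A))
    m = len(a)
    alpha = np.full(m, 1.0 / m) if init is None else np.asarray(init, dtype=float)
    loglik = 0.0
    for k in range(1, len(y)):
        mean = a * y[k - 1] + b                        # regime-wise conditional mean
        dens = np.exp(-0.5 * (y[k] - mean) ** 2 / sigma2) / np.sqrt(2 * np.pi * sigma2)
        alpha = (alpha @ A) * dens                     # predict, then weight by density
        c = alpha.sum()                                # p(y_k | y_0, ..., y_{k-1})
        loglik += np.log(c)
        alpha = alpha / c                              # rescale to a probability vector
    return loglik
```

A useful sanity check, related to the nesting of the models, is that two regimes with identical parameters give the same value as the one-regime model, whatever the transition matrix.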

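Using the above notation we may represent the AR-MR process defined by
Eq. (2.1) by means of its $m$ linear models: for each $1 \le i \le m$,

$$Y_{I_i} = W_i \theta_i + \sigma_i E_i. \qquad (3.2)$$

Thus, the distribution of $Y_1^n$ conditional on $x_1^n$ is written as

$$p_\psi(Y_1^n \mid Y_0 = y_0, x_1^n) = \prod_{i=1}^{m} \frac{1}{(2\pi\sigma_i^2)^{n_i/2}} \exp\Big( -\frac{1}{2\sigma_i^2} (Y_{I_i} - W_i\theta_i)^* (Y_{I_i} - W_i\theta_i) \Big).$$

We assume that the prior distribution $p(\psi)$ on $\Psi_m$ satisfies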

$$p(\psi) = p(A)\, p(\theta \mid \sigma^2)\, p(\sigma^2) = \prod_{i=1}^{m} p(A_i)\, p(\theta_i \mid \sigma_i^2)\, p(\sigma_i^2),$$

where $A_i$ denotes the $i$-th row of $A$. Due to (3.2) we will consider the
prior distribution for $(\theta, \sigma^2)$ belonging to a Gaussian-Gamma family
(see Broemeling [1], §1, p. 3); that is, for each $i = 1, \ldots, m$,

H1 $\theta_1, \ldots, \theta_m$ are independent, each with a Gaussian distribution given $\sigma_i^2$.
H2 $\sigma_1^2, \ldots, \sigma_m^2$ are independent with Inverse-Gamma distribution.

H3 $A_1, \ldots, A_m$ are independent with $A_i \sim \mathcal{D}(e_i)$, where
$\mathcal{D}$ denotes a Dirichlet density with parameter vector
$e_i = (1/2, \ldots, 1/2)$,

$$p(A_i) = \frac{\Gamma(m/2)}{\Gamma(1/2)^m} \prod_{j=1}^{m} a_{ij}^{-1/2}.$$

The related mixture statistic is defined by

$$q_m(Y_1^n) = \int_{\Psi_m} p_\psi(Y_1^n \mid Y_0 = y_0)\, p(\psi)\, d\psi.$$

The main result of this section is the comparison between the likelihood
function and the mixture statistic.
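For the transition part of the mixture, the Dirichlet(1/2, ..., 1/2) prior of H3 gives a closed form via the standard Dirichlet-multinomial integral: each row $i$ contributes $\frac{\Gamma(m/2)}{\Gamma(n_{i\cdot} + m/2)} \prod_j \frac{\Gamma(n_{ij} + 1/2)}{\Gamma(1/2)}$, where $n_{i\cdot} = \sum_j n_{ij}$ is the number of transitions out of state $i$. The sketch below evaluates this quantity for a path $(x_0, \ldots, x_n)$ and compares it with the maximized path probability. The function names are hypothetical, states are labelled 0, ..., m-1, and the path is conditioned on $x_0$; these are assumptions made for the illustration.

```python
import itertools
from math import exp, lgamma, log

import numpy as np

def transition_counts(x, m):
    """n_ij = number of k with (x_{k-1}, x_k) = (i, j)."""
    x = np.asarray(x, dtype=int)
    trans = np.zeros((m, m), dtype=int)
    np.add.at(trans, (x[:-1], x[1:]), 1)
    return trans

def log_kt_mixture(x, m):
    """log of the Dirichlet(1/2,...,1/2) mixture of prod_ij a_ij^{n_ij}."""
    trans = transition_counts(x, m)
    total = 0.0
    for i in range(m):
        out_i = int(trans[i].sum())               # transitions leaving state i
        total += lgamma(m / 2) - lgamma(out_i + m / 2)
        total += sum(lgamma(int(nij) + 0.5) - lgamma(0.5) for nij in trans[i])
    return total

def log_ml_path(x, m):
    """log of max_A prod_ij a_ij^{n_ij}, attained at a_ij = n_ij / n_i."""
    trans = transition_counts(x, m)
    total = 0.0
    for i in range(m):
        out_i = int(trans[i].sum())
        total += sum(int(nij) * log(int(nij) / out_i) for nij in trans[i] if nij > 0)
    return total

def total_mass(n, m, x0=0):
    """Sum of the mixture over all paths x_1..x_n started at x0 (should be 1)."""
    return sum(exp(log_kt_mixture((x0,) + path, m))
               for path in itertools.product(range(m), repeat=n))
```

Since the mixture is an average of path probabilities, it never exceeds the maximized one; the gap `log_ml_path(x, m) - log_kt_mixture(x, m)` is the kind of quantity that a mixture-type inequality bounds by a multiple of $\log n$.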

Under the assumptions (S1-S6) and (H1-H3) described before we have the 
following theorem. 

Theorem 3.1. For each $m \ge 1$, the prior distribution $p(\psi)$ satisfies the
inequality



^^^P^YrlYo ^ yo) 



qm{Y{^) 



< n log('^) + Crn(") + d{n) + -^ log ^t' ^ ^ _ + e^{n), 



where 

YJP.Y,, Y?;P.Y,. 



= max 



Y*^BfcY,, »=i,...,m Y|B,Y,, 
and for each n > A, 

T(m/2) m{m — 1) 1 

r(l/2) A^ ^12^ 

mlog(27r) I 



Cm{Tt) = max < 0, log?n — m [ log 



e-m{n) ~ max 



{»^?>"^(^ + ^f<A.".)^)- 



$$P_i = I - W_i M_i W_i^*$$
4. Penalized estimation of the number of states

The purpose of this section is to advance an estimation method based on
penalized maximum likelihood in order to select the number of states $m$ of a
hidden Markov chain $\{X_n\}$. For every integer $m \ge 1$, we consider the sets
$\Psi_m$ and $\mathcal{M} = \bigcup_{m \ge 1} \Psi_m$, the family of all models
(with the convention $\Psi_0 = \emptyset$). We define the number of states $m_0$
through the property

$$p_{\psi_0} \in \{p_\psi : \psi \in \Psi_{m_0}\} \cap \{p_\psi : \psi \in \Psi_{m_0 - 1}\}^c. \qquad (4.1)$$

Remark (Identifiability): We assume that for the true model $\psi_{m_0}$ the
vector components $\{(a_i, b_i, \sigma_i)\}_{i=1}^{m_0}$ are different; thus, for
every $n$, there exists a point $Y_{n-1} \in \mathbb{R}$ such that
$\{(a_i Y_{n-1} + b_i, \sigma_i)\}_{i=1}^{m_0}$ are different. Therefore, in
agreement with Remark 2.10 of Krishnamurthy and Yin [15], the model is
identifiable in the following sense: if $K$ stands for the Kullback-Leibler
divergence and $K(\psi, \psi_{m_0}) = 0$, then $\psi = \psi_{m_0}$. As a result,
identifiability implies that $m_0$, defined by Eq. (4.1), is unique.

Let $\mathrm{pen}(n, m)$ be a penalty term, given by a positive function that
is increasing in $n$ and $m$. We define the penalized maximum likelihood (PML)
estimator for $m_0$ as

$$\hat{m}(n) = \arg\min_{m \ge 1} \Big\{ -\sup_{\psi \in \Psi_m} \log p_\psi(Y_1^n \mid Y_0 = y_0) + \mathrm{pen}(n, m) \Big\}. \qquad (4.2)$$

We say that $\hat{m}(n)$ overestimates the number of states $m_0$ if
$\hat{m}(n) > m_0$ and that it underestimates the number of states if
$\hat{m}(n) < m_0$.
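Once the maximized log-likelihood is available for each candidate order, the selection rule (4.2) is a plain minimization of a penalized score. The sketch below shows only that final step. The names `select_order` and `bic_like_pen` are hypothetical, and the BIC-type penalty (half the number of free parameters, counted here as $m(m-1) + 3m$, times $\log n$) is a stand-in used for illustration; it is not the penalty studied in this paper.

```python
from math import log

def bic_like_pen(n, m):
    """Illustrative penalty only: 0.5 * (free parameters) * log n.

    Free parameters assumed: m(m-1) transition probabilities,
    m slopes, m intercepts and m variances.
    """
    return 0.5 * (m * (m - 1) + 3 * m) * log(n)

def select_order(max_loglik, n, pen=bic_like_pen):
    """Return (m_hat, scores) where m_hat minimizes -sup log-lik + pen(n, m).

    max_loglik maps each candidate m to sup_psi log p_psi(Y_1^n | Y_0 = y_0).
    Ties are resolved in favour of the smallest m.
    """
    scores = {m: -max_loglik[m] + pen(n, m) for m in sorted(max_loglik)}
    m_hat = min(scores, key=scores.get)
    return m_hat, scores
```

For example, with maximized log-likelihoods of -150, -120 and -119 for m = 1, 2, 3 and n = 100, the small gain from m = 2 to m = 3 does not pay for the extra penalty, so the rule selects m = 2.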

In the following theorem we prove that the PML estimator for $m_0$ does not
underestimate the number of states.

Theorem 4.1. Assume (S1-S6) and that $\lim_{n \to \infty} \mathrm{pen}(n, m)/n = 0$ for all $m$. Then

$$\hat{m}(n) \ge m_0 \quad \text{a.s.}$$
In order to prove this theorem, the following two lemmas are necessary:

Lemma 4.1 (Finesso [7]). Assume (S1-S6). The set of functions
$f_n(\psi) = \frac{1}{n} \log p_\psi(Y_1^n \mid Y_0 = y_0)$ is an equicontinuous
sequence a.s.-$P_{\psi_0}$.

The following result is standard in the context of order selection for a nested
family of models; see [2], §15, pp. 577-578. For HMM models, similar results are
given, for example, in [10, 3].

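Lemma 4.2. Assume (S1-S6). We have:

1. For each $m \ge 1$ and $\psi, \psi_0 \in \Psi_m$ there exists $K(\psi, \psi_0) < \infty$ such that:

$$\lim_{n \to \infty} \frac{1}{n} \big[ \log p_{\psi_0}(Y_1^n \mid Y_0 = y_0) - \log p_\psi(Y_1^n \mid Y_0 = y_0) \big] = K(\psi, \psi_0).$$

2. For each $\psi_{m_0} \in \Psi_{m_0} \cap \Psi_{m_0 - 1}^c$,

$$\min_{m < m_0} \inf_{\psi \in \Psi_m} K(\psi_{m_0}, \psi) > 0.$$

3. For each $\psi \in \Psi_m$, $y_1^n \in \mathbb{R}^n$ there exists $i \in \{1, \ldots, I_{\varepsilon,m}\}$ such that

$$\frac{1}{n} \big| \log p_{\psi_i}(Y_1^n \mid Y_0 = y_0) - \log p_\psi(Y_1^n \mid Y_0 = y_0) \big| < \varepsilon.$$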



In the following theorem we prove that the estimator $\hat{m}$ does not
overestimate the number of states $m_0$.

Theorem 4.2. Assume (S1-S6) and (H1-H3). Set $\rho > 2$, and for each $n \ge 4$,

$m \ge 1$

m 1/7 I 1 \ , ra m 

pen{n, m) = \ logri + ^^ q('t-) + / e;(n) + m(?n + l)4>{n) logri, 

1=1 1=1 1=1 

where $\phi(n) = o(n)$. Then, for each $m < m_0$ it holds that $\hat{m}(n) \le m_0$ a.s.-$P_{\psi_0}$.

5. Proofs 

Proof of Theorem 3.1. 

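We observe that

$$\begin{aligned}
q_m(Y_1^n) &= \sum_{x_1^n} \int\!\!\int\!\!\int p_{\theta,\sigma^2}(Y_1^n \mid Y_0 = y_0, x_1^n)\, p_A(x_1^n)\, p(A)\, p(\theta)\, p(\sigma^2)\, dA\, d\theta\, d\sigma^2 \\
&= \sum_{x_1^n} \int\!\!\int p_{\theta,\sigma^2}(Y_1^n \mid Y_0 = y_0, x_1^n)\, p(\theta)\, p(\sigma^2)\, d\theta\, d\sigma^2 \int p_A(x_1^n)\, p(A)\, dA \\
&= \sum_{x_1^n} q_m(Y_1^n \mid Y_0 = y_0, x_1^n)\, q_m(x_1^n). \qquad (5.1)
\end{aligned}$$

Hence, the theorem can be proved by finding constants $C_1, C_2$ such that:

$$p_{\theta,\sigma^2}(Y_1^n \mid Y_0 = y_0, x_1^n) \le C_1\, q_m(Y_1^n \mid Y_0 = y_0, x_1^n) \qquad (5.2)$$

$$p_A(x_1^n) \le C_2\, q_m(x_1^n). \qquad (5.3)$$

Thus, taking into account equations (5.1) and (3.1),

$$\begin{aligned}
p_\psi(Y_1^n \mid Y_0 = y_0) &= \sum_{x_1^n} p_{\theta,\sigma^2}(Y_1^n \mid Y_0 = y_0, x_1^n)\, p_A(x_1^n) \\
&\le C_1 C_2 \sum_{x_1^n} q_m(Y_1^n \mid Y_0 = y_0, x_1^n)\, q_m(x_1^n) \\
&= C_1 C_2\, q_m(Y_1^n).
\end{aligned}$$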

Let us evaluate $q_m(x_1^n)$ following the proof given in the Appendix of Ref. [16].
Consider 
and



imix^i) = n 



< 



r(m/2) / " r(n,,- + 1/2) 

r(n, + 1/2) l^lj r(i/2) ^ 

nr=in;Li(^)"'^ 



r(Tn/2) /-prm r(n.j + l/2) 
r(ni + l/2) Uli=l r(l/2) 



9m (a;") J]™^ 
The right-hand side of equation (5.4) does not exceed

"r(n-fm/2)r(l/2) 



(5.4) 



_r(m/2)r(7i+l/2) 
Gassiat and Boucheron [10] showed that



mlog 



r(n + m/2)r(l/2) 



r7i(m — 1) , , , 

< :z log n + Cm [n) , 



_r{m/2)T{n + l/2) 
where, for $n \ge 4$, $c_m(n)$ is selected as:

r(7Ti/2) m(rn — 1) 1 



log m — m [ log 



r(i/2) 



An 12n 



It follows that

$$\frac{p_A(x_1^n)}{q_m(x_1^n)} \le n^{m(m-1)/2}\, e^{c_m(n)}. \qquad (5.5)$$

What remains is to evaluate the quotient between $p_{\theta,\sigma^2}(Y_1^n \mid Y_0 = y_0, x_1^n)$
and $q_m(Y_1^n \mid Y_0 = y_0, x_1^n)$. Let us start with the evaluation of $q_m$.

gm(yi"|>^ = 2/o,:E?) 

(-^(Y/;-Wiei)'(Yj^-w,e,)) 



[](27raf)-"*/2e^ ^ 

2 — 1 



Tivo/2) ' ' ^^^^^'■ 



27rr2fT2 



The result of evaluating the mixture, upon integration over the variables
$\theta$ and $\sigma^2$, is:

9m(yi"|lo== 2/0,0 



n (2^)n.^2 UJ r(«o/2) ^ '■ * '' °^ ^ 2 



Now, setting uq — ^ and wq — > (which means that in the limit we consider a 
priori distributions which are not informative for a^ although they are improper) 



/^niv "N TT v/lct(M^y2^ „./2 

qrn{Yi \Yo = yo,a;i) = [[ . .„^ , (Y^^P^Y/J r(n,;/2). 
Again, conditioning on $Y_1^n = y_1^n$ and $x_1^n$, as the model is both linear
and Gaussian, the ML estimators for $1 \le i \le m$ are

$$\hat{\theta}_i = (W_i^* W_i)^{-1} W_i^* Y_{I_i}, \qquad \hat{\sigma}_i^2 = \frac{1}{n_i} \big( Y_{I_i}^* Y_{I_i} - \hat{\theta}_i^* W_i^* Y_{I_i} \big).$$
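Conditionally on the regime path, the ML estimators are the ordinary least squares fit of $Y_{I_i}$ on $W_i$ within each regime. The following minimal sketch computes them for one regime; `regime_mle` is a hypothetical helper, and `numpy.linalg.lstsq` is used instead of forming $(W_i^* W_i)^{-1}$ explicitly, which is a numerical choice and not something stated in the text.

```python
import numpy as np

def regime_mle(Y, W):
    """ML estimates (theta_hat, sigma2_hat) for one regime.

    Y: vector Y_{I_i} of length n_i.
    W: n_i x 2 design matrix [1, Y_{I_i - 1}].
    """
    Y = np.asarray(Y, dtype=float)
    W = np.asarray(W, dtype=float)
    # Least squares solution of W theta = Y, i.e. (W* W)^{-1} W* Y when W has full rank
    theta_hat, *_ = np.linalg.lstsq(W, Y, rcond=None)
    # (1/n_i) * (Y* Y - theta_hat* W* Y), the mean squared residual
    sigma2_hat = (Y @ Y - theta_hat @ (W.T @ Y)) / len(Y)
    return theta_hat, sigma2_hat
```

The first component of `theta_hat` is the intercept $\hat{b}_i$ and the second the autoregressive coefficient $\hat{a}_i$, matching $\theta_i = (b_i, a_i)^*$.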

Taking into account that $p_{\theta,\sigma^2}(Y_1^n \mid Y_0 = y_0, x_1^n) \le p_{\hat{\theta},\hat{\sigma}^2}(Y_1^n \mid Y_0 = y_0, x_1^n)$ and
that the right-hand side of the inequality satisfies

m 

i=l 
m 

i=l 

We arrive at the following expression for the density quotient:

ii/2 



iYr\yo,x^) ^ " <-/V'/2 iYlP^Y 



< 



nifTiF^ v^^ -Vdet(M-.) 



Taking logarithms of both sides of the inequality, we have that: 

1 Pe,a<Y{^\Yo = yo,x^l) ™ ^ a ^ V^ "m Y^T^Yz 



^logrYdct(M: 



r-l\ 



i=l 

= T1+T2 + T3 (say.) 

Let us notice that the right-hand side of the former inequality satisfies the 
following bounds: For term Ti we have 

El n"'' tt"'/^ \ n 1 /n\ mlog(27r) 
^^/"g( e-W(V2) J^2 + 2^"g^ ' 

For term T2 



V!^logIl£I^ < I^logllZl^ 
^2 ^Y*^B,Y,. - 2 ^Y*^B,Yj, ' 

and for term T3 

r^det(M-i) = l + r^n,^y^,-r4(^n_i)'+r2 + r^^y,_i 



R. Rios and L. Rodriguez/ Estimation of the number of states for AR-RM 1121 

we write the first term of the inequality 



5]logTydet(Mri) 

m 



'\ 



kGli 



l + r^n,Y,Yl,-r'^iJ2Y,^A + r^ + r^ ^ y,_i 



fce/i 



fee/i 



^\ogy/ITvi (say.) 



Making use of both convexity and the Ergodic Theorem, we see that the
following relation is satisfied:



m / _. m \ ™/^ / 4 2™ 

^log Vi + vi < log 1 + - 5] y. = log 1 + ^ 5I(^»'^') 



Tn/2 



Substituting the calculated bounds 
Ps,„2{Y{^\Yo = yo,xf) 



log- 



g™(n"|i'o = yo,a:?) 



< -y aog(2) + log(27r) + log(n)) + ^— n + - log ^^^ ^ 



log 1 



m 



■Y.^\<^^)' 



m/2 



D 



Proof of Lemma 4.1. 



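We work directly with the extended Markov chain $\{(Y_n, X_n)\}$. Let
$h_n(\psi) = \frac{1}{n} \log p_\psi(Y_0^n, x_1^n)$ and let $\psi, \psi' \in \Psi$.
We prove that for each $\varepsilon > 0$ there exists a $\delta(\varepsilon) > 0$
such that:

$$\forall n \quad |h_n(\psi) - h_n(\psi')| < \varepsilon \quad \text{if} \quad \|\psi - \psi'\| < \delta(\varepsilon).$$

The complete likelihood is written as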



P^W.a^i 



n rn 



fc=i i j=i 



,li,i(a;fc,a;fc+i) 



-^i (27ra2)»./2 

i— 1 ^ ^ ^ 



v-i^(Yi.-w.eo*(Y,,-w.e.)) 
from which it follows that



_. ni 
< - ^ Tlijl log Qij -l0g( 



i,j=i 



1 

2n 



^njlogcr,^- log cr| 



E 



1 

2^ 



2a^ 



^i^h 



Enw. 



|(M)^-^-<M) 



Ti + Ta + Tg + T4 + T5 



(say.) 



(5.6) 



The right-hand side of the inequality (5.6) can be bounded in the following way:

• Since $n_{ij}/n \le 1$, $n_i/n \le 1$ and the parameters $a_{ij}, a'_{ij}, \sigma_i^2, \sigma_i'^2$ are
bounded below (by S6), there exists a constant $C_1$ such that the terms $T_1$ and
$T_2$ of (5.6) are bounded above by $C_1 \|\psi - \psi'\|$.

• Due to the compactness of the parameter space (S6), there exists a constant
$C_2$ such that the term $T_3$ of (5.6) is bounded above by
$C_2 \|\sigma^2 - \sigma'^2\| \frac{1}{n} \sum_{k=1}^{n} Y_k^2$.
The stability condition (S4) and the existence of the moments of $e_1$ (S3)
imply (see Yao and Attali [22]), by the Ergodic Theorem, that the averages
$\frac{1}{n} \sum_{k=1}^{n} Y_k^2$ are controlled. Hence

1 



(^2lk'-^2||_Vyfe<C3||v^-^|| a.s. 

n ^ — ^ 



k=l 



By the same argument of compactness (S5 and S6) we have 



Ti<C4U'-i^\\ 



-E^^+'EE^'^^fc-i 



fc=l 



i=i fee/i 



and again, by the Ergodic Theorem, the right-hand side of the above
inequality is bounded above by $C_4 \|\psi - \psi'\|$ a.s.
• By the Cauchy-Schwarz-Bunyakovsky inequality



n < 



1 

-E 

i=\ 

< C5||^-v^||-V||w*w, 

T). ^ ^ " 



WW,; 



Now, the norm of the symmetric matrix $W_i^* W_i$ is given by the absolute
value of the largest real eigenvalue, which in the present case is



tr(W*W,) + ^tr(W*W,)2 _ 4det W*W, 
Since $\det W_i^* W_i$ is positive,



tr(W,*W,) + ^tr(W*W,)2 „ 4det W*Wi 
2 

Since tr(W*WO = n, + J^keu ^k^ ^hen 

^ in 1 ^ 

iV||w*w,||<i + -Vy, 



< tr(W*Wi). 



fc=i 



Thus, the last term of (5.6) is smaller than C^Wif,' — -011. 
We thus reach the conclusion that there exists a constant $C$ such that

$$|h_n(\psi) - h_n(\psi')| \le C \|\psi - \psi'\| \quad \text{a.s.}$$

This implies that $h_n$ is an equicontinuous sequence. In order to return to
$\{Y_n\}$ we note that

1. pAYo\^i) 



n^°^p^KV,a:?) 



<£. 



from where we have 



and then, adding over x" 



from where it follows 






<e. 



D 



Proof of Lemma 4.2. 



The first part follows from proposition 2.9 of [15]. 

To prove the second part, we follow Leroux lemma (see [3], Lemma 8, p. 21), 
for every $\psi \in \Psi_{m_0}$ such that $p_\psi \ne p_{\psi_{m_0}}$, there exist a
neighborhood $O_\psi$ and $\varepsilon > 0$ such that
$\inf_{\psi' \in O_\psi} K(\psi_{m_0}, \psi') > \varepsilon$. Since, however,
$\Psi_{m_0 - 1}$ is compact, it is covered by a finite union
$O_{\psi_1}, \ldots, O_{\psi_I}$ (each one of them associated with an
$\varepsilon_i > 0$); hence,

inf i^(i/','0o) > min inf KI^^iPq) > mine^ > 0. 

In order to carry out the third part of this proof, let
$\{B_\delta(\psi) : \psi \in \Psi_m\}$ be a covering of $\Psi_m$ by open balls.
Due to the compactness of $\Psi_m$ there exists a finite subcovering
$B_\delta(\psi_1), \ldots, B_\delta(\psi_I)$. Thus, for every $\psi \in \Psi_m$
there exists $i \in \{1, \ldots, I\}$ such that, from Lemma 4.1,



fogp^.(yi"|yo = yo) - \ogp^(Y{^\Yo = yo) 



<£. 

D 



Proof of Theorem 4.1. 



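We use the fact that $P(\hat{m}(n) < m_0 \ \text{i.o.}) \le \sum_{m=1}^{m_0 - 1} P(\hat{m}(n) = m)$. We prove
that $P(\hat{m}(n) = m) = 0$. Indeed,

$$\begin{aligned}
P(\hat{m}(n) = m) &\le P\Big( \sup_{\psi \in \Psi_m} \log p_\psi - \mathrm{pen}(n, m) \ge \log p_{\psi_{m_0}} - \mathrm{pen}(n, m_0) \Big) \\
&\le P\Big( \sup_{\psi \in \Psi_m} \log p_\psi \ge \log p_{\psi_{m_0}} - \mathrm{pen}(n, m_0) + \mathrm{pen}(n, m) \Big), \qquad (5.7)
\end{aligned}$$

since $\psi \in \Psi_m$, according to Lemma 4.2 there exists $1 \le i \le I$ such that
$\log p_\psi \le n\varepsilon + \log p_{\psi_i}$; hence it follows from (5.7) that

$$\begin{aligned}
P(\hat{m}(n) = m) &\le P\Big( \max_{i} \log p_{\psi_i} \ge \log p_{\psi_{m_0}} - \mathrm{pen}(n, m_0) - n\varepsilon \Big) \\
&\le \sum_{i=1}^{I} P\Big( \frac{\log p_{\psi_i} - \log p_{\psi_{m_0}}}{n} \ge -\frac{\mathrm{pen}(n, m_0)}{n} - \varepsilon \Big)
\end{aligned}$$

and again, according to Lemma 4.2,

$$\lim_{n \to \infty} \frac{\log p_{\psi_i} - \log p_{\psi_0}}{n} = -K(\psi_i, \psi_0)$$

and by hypothesis $\lim_{n \to \infty} \mathrm{pen}(n, m)/n = 0$, from which it follows that,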
F{m{n) = m) < ^ P (e < Ki^P^tPo) < e) = 0. 

Proof of Theorem 4.2. 

Let us define the set 

and 

nm YtPfcYj. 
A„,m = c„,(n) + dm(n) + e,„(7i) + -— log , + pen[n, mo) - pen{n, m). 

Z Y^iJfeYfc 



n 
We note that

P^o {m{n) >ma)< ^ P.0„ {in{n) > toq, ^„) + P^o(^«) 

m>mo 

and 

^0^0 {fh^m, An) 



(a) 
< 



V.„(logP0„„(^i"|yo = 2/o)< sup logpv,„(n"l^o = yo) 

+pen{n, toq) — pen{n, m), A„) 



< P^-^o log 2 < /^ Ar, 

1 log -fV^ TvT- - ^'"•"' ^" 



X °/.yn_'l g^^(n" = yi)dy^ 



' ■m{m+ 1) , , , , , ,, , , , 

- *'^P ^ log(n) + Cm{n) + din) + e,„(n) 



H h pen[n, too) — pen[n, m 

where (a) is a consequence of the way the PML estimator is defined (4.2) and 
(b), of Theorem 3.1. 

In what follows we consider the quotient $Y_{I_k}^* P_k Y_{I_k} / Y_{I_k}^* B_k Y_{I_k}$.
Conditioning on $Y_1^n = y_1^n$ and $x_1^n$, as the model is both linear and
Gaussian (3.2), $Y_{I_k}^* P_k Y_{I_k}$ has a $\chi^2(n_k, \gamma)$ distribution,
where $\gamma = (1/2)\, \theta_k^* W_k^* P_k W_k \theta_k$ is the non-centrality
parameter; further, we assume that $P_k$ has maximum rank. Moreover,
$\chi^2(n_k, \gamma)$ can be approximated by a $\chi^2_r$ having the same mean
and the same variance, with $r = (n_k + 2\gamma)^2 / (n_k + 4\gamma)$. For the
denominator, if we assume $B_k$ to have full rank, then
$Y_{I_k}^* B_k Y_{I_k}$ is distributed as $\chi^2_{n_k}$ (see Searle [20], §2,
pp. 49-53).
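The moment-matching step can be checked numerically. With the non-centrality convention used above, a $\chi^2(n_k, \gamma)$ variable has mean $n_k + 2\gamma$ and variance $2(n_k + 4\gamma)$. Matching both moments with a scaled central chi-square $c\,\chi^2_r$ gives $r = (n_k + 2\gamma)^2/(n_k + 4\gamma)$ and $c = (n_k + 4\gamma)/(n_k + 2\gamma)$. The scale factor $c$ is an assumption added here (it is the usual two-moment, Patnaik-type approximation); the text only states $r$. The helper name `match_chi2_moments` is hypothetical.

```python
def match_chi2_moments(n_k, gamma):
    """Two-moment match of a non-central chi-square by c * chi2_r.

    Convention: mean = n_k + 2*gamma, variance = 2*(n_k + 4*gamma).
    Returns (r, c) with c*r equal to the mean and 2*c^2*r equal to the variance.
    """
    mean = n_k + 2.0 * gamma
    half_var = n_k + 4.0 * gamma          # variance / 2
    r = mean ** 2 / half_var              # matched degrees of freedom
    c = half_var / mean                   # matched scale (assumption, see lead-in)
    return r, c
```

When $\gamma = 0$ the approximation reduces to the central case, $r = n_k$ and $c = 1$.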

On the other hand, 

j=lr-^Y\Wk{WlW^)-'M^WiYj, « lr-2||Y*^W,||2, 

substituting in $r$, we have:

_(7^, + 27)2_ (n,+r-2||Y*^Wfc|| 



{uk + 47) {uk + 2r-2||Y* WfeP) V2r2 



= o — ^ a.s 
Since we notice that $n_k / n \to \lambda_k$ a.s. (S1),



>tn\ ~ P I ^„.... < 

"'=/2-l(l +„)-("'= +'-)/2rfu 






rmr(§;.o 



2 • ■ u- 



p^nid^r 



< 



1 



and choosing i„ = n'^^k^^ we have that 



> tn < 



/.^^^^^' y ./^x.^^^TT^ 



We have proven that Yj Pa;Y/j,/Yj BfcY/^ is bounded in probabihty with 



a rate n^^k^ . We still have to determine the bounds for A„m. 
Using the definition of function pen{n, m) we have 

m-i i(j : T\ ™-i 

A„,„ < --(m-mo)log(n) - ^ logn- ^ Q(n) 

Z^771o + l / — mo + 1 

'J^^ 3m(/)(n) log(Tt) 

— > e/(n) + mnlogn H — 5 

-^-^ 4AfcT^ 

/— mo + l 

mo(™o + l) ,, s, TO(m+l) ,. ., 
+ ^ 0(71) log n (j)(n)logn 

for $m = m_0 + 1$ we have that

m — 1 w . _. X ?7l — 1 T7l — 1 

l—rriQ + l /— mo + 1 /— mol 



, ,, , 3777 

(7770 - m)(mQ + m) + mo + — — - 

AXhT^ 



(t>{n) log 77 



We select r^ = ^ as: 
Anrn < - - {m - mo) \og{n) + 

< --{m-mo)\og{n) 

Therefore:

$$P_{\psi_0}(\hat{m} = m, A_n) \le e^{-\frac{\rho}{2}(m - m_0)\log(n)} = O(n^{-\rho/2}),$$

and $P_{\psi_0}(A_n^c) = O(e^{-n \log n})$; hence
$P_{\psi_0}(\hat{m}(n) > m_0) = O(n^{-\rho/2} + e^{-n \log n})$. Thus, in view
of the Borel-Cantelli Lemma, $\hat{m}(n) \le m_0$ a.s. □